Benchmarking Pascal Compilers: TopSpeed v3.02 v Turbo v6.0
==========================================================

In response to messages on the BBS suggesting that TopSpeed Pascal may
generate slower code than Turbo Pascal, I have prepared a number of
standard benchmarks to test these claims.

Benchmarks need careful interpretation: the performance of a compiler
in a particular benchmark does not necessarily imply anything about its
performance in a real world application. Most benchmarks are very small
programs that test a single aspect of a compiler's performance,
typically in a very limited fashion. Modern optimising compilers may
perform very differently in large real world applications than in
trivial benchmarks.

The following text presents the results of my benchmarks together with
a brief description of each benchmark program and an interpretation of
the results. The benchmarks themselves are included with this paper.

The timing mechanism for these benchmarks is a little unusual and
requires some explanation. Each benchmark is run in a loop, which
naturally incurs some overhead that must be accounted for. To calculate
the overhead, a loop which calls a Dummy procedure is executed for
1 second. The number of completed loops is recorded in NullLoops and
the time taken is recorded in NullTime.

Then the actual benchmark is run in a loop which invokes the benchmark
AND the Dummy procedure. This loop is executed for approximately
1 minute. The number of completed loops is recorded in BenchLoops and
the time taken is recorded in BenchTime.

The loop overhead for the latter loop is then calculated using the
expression:

    LoopOverhead := (NullTime/NullLoops)*BenchLoops

The result is subtracted from BenchTime to give TotalTime, and
LoopsPerSecond is calculated as BenchLoops/TotalTime. Thus the greater
the value of LoopsPerSecond, the faster the benchmark.

NOTE: The NOx87 benchmarks were run on a 25MHz 386 with no coprocessor.
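
The overhead correction described above can be sketched in a few lines.
This is a minimal illustration in Python rather than the Pascal used by
the benchmarks themselves. The function name and its argument list are
assumptions made for illustration, and the sample figures are the
TopSpeed NOx87 measurements from the Dhrystone table in this paper.

```python
def loops_per_second(null_time, null_loops, bench_time, bench_loops):
    """Apply the loop-overhead correction described in the text.

    Returns a tuple (loop_overhead, total_time, loops_per_second).
    The function name and signature are illustrative assumptions.
    """
    # Cost of one pass through the empty loop (Dummy call only).
    overhead_per_loop = null_time / null_loops
    # LoopOverhead := (NullTime/NullLoops)*BenchLoops
    loop_overhead = overhead_per_loop * bench_loops
    # Time attributable to the benchmark itself.
    total_time = bench_time - loop_overhead
    return loop_overhead, total_time, bench_loops / total_time


if __name__ == "__main__":
    # Sample inputs: TopSpeed, NOx87 column of the Dhrystone table.
    overhead, total, rate = loops_per_second(1.04, 3469, 60.03, 26653)
    print(f"LoopOverhead     : {overhead:8.2f}")
    print(f"TotalTime        : {total:8.2f}")
    print(f"Loops per second : {rate:8.2f}")
```

With these inputs the sketch reproduces the 7.99, 52.04 and 512.17
shown in the corresponding column of the Dhrystone table.
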
The Turbo Pascal REAL type uses a non-standard 6-byte representation;
this is the only floating point type supported by the Turbo Emulator.
The TopSpeed Pascal REAL type uses an 8-byte representation, though the
TopSpeed Emulator also supports 4 and 10-byte floating point types with
10-byte intermediate values. All of these TopSpeed types are designed
for compatibility with the formats used by the Intel coprocessors.
Because of the difference in representation it is reasonable to expect
TopSpeed Pascal floating point programs to achieve better precision at
the cost of some speed. Also, the TopSpeed INTEGER type uses a 4-byte
representation whereas the Turbo Pascal INTEGER uses a 2-byte
representation.

The EMU and x87 benchmarks were run on a 20MHz 486DX. You should be
careful when comparing results from different machines.

For comparative purposes the benchmarks use the types MyReal and MyInt,
which are defined as follows:

                          TopSpeed            Turbo
                          ========            =====
    MyReal (EMU & NOx87)  REAL (8-byte)       REAL (6-byte)
    MyReal (x87)          REAL (8-byte)       DOUBLE (8-byte)
    MyInt                 INTEGER (4-byte)    LONGINT (4-byte)

x87 results are not shown for non-floating point benchmarks; instead
the x87 column contains "N/A".

The Turbo Pascal programs were compiled from the command line using the
following parameters:

    (EMU & NOx87)  /B /$A+ /$D- /$E+ /$N- /$G+ /$I- /$L- /$R- /$S- /$V-
    (x87)          /B /$A+ /$D- /$E- /$N+ /$G+ /$I- /$L- /$R- /$S- /$V-

The TopSpeed Pascal benchmarks were compiled using the following .PR
files:

    (EMU & NOx87)
        #system auto exe
        #model small
        #pragma optimize(cpu=>286)
        #compile %main
        #link %prjname

    (x87)
        #system auto exe
        #model small
        #pragma optimize(cpu=>286, copro=>287)
        #compile %main
        #link %prjname

Both compilers were instructed to produce code for the 286, optimised
for speed, with no run-time checks.


Ackermann Benchmark
===================

The Ackermann Benchmark is a simple recursive benchmark for function
call overhead.
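
For readers unfamiliar with it, the Ackermann function is defined by
three recursive cases, and almost all of its running time is spent
making calls. The sketch below is in Python rather than Pascal and only
illustrates the call pattern. The arguments shown are an assumption
chosen for illustration, not necessarily the values used by the
benchmark included with this paper.

```python
def ackermann(m, n):
    """Two-argument Ackermann function in its usual recursive form."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    # The inner call is evaluated first, then fed to the outer call,
    # so even small arguments generate a very large number of calls.
    return ackermann(m - 1, ackermann(m, n - 1))


if __name__ == "__main__":
    # Small, illustrative arguments (an assumption, see the lead-in).
    print(ackermann(2, 3))
    print(ackermann(3, 3))
```
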
TopSpeed Pascal passes parameters in registers, unlike most other
compilers, and can be expected to outperform other compilers in this
benchmark.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.05     N/A        1.04    1.04     N/A
BenchTime        :   60.15   60.96               61.30   60.58
Null loops       :    3276    2506                3648    2674
Bench loops      :      63      57                  45      35
LoopOverhead     :    0.02    0.02                0.01    0.01
TotalTime        :   60.13   60.94               61.29   60.57
Loops per second :    1.05    0.94                0.73    0.58

TopSpeed Pascal comes out as the clear winner in this benchmark,
illustrating that register parameter passing can have a marked effect.
The effect is greatest where there are a large number of calls to small
functions or procedures taking a number of parameters.


Dhrystone Benchmark (March 84), Version Pascal / 2
==================================================

This is a translation of the classic synthetic benchmark by Reinhold
Weicker. The Dhrystone was written to contain a sequence of statements
of different types that would closely match the proportions of these
statements found in a large sample of real programs. It is generally
thought that this benchmark should provide a good indication of a
compiler's performance in real world applications. However, many modern
optimising compilers have been written to do well in benchmarks such as
the Dhrystone and so give misleading results. On the other hand, careful
analysis of the resulting code can reveal a compiler's weaknesses and
strengths.

There are no floating point expressions in the Dhrystone, making it a
good test of the code generation capabilities of the compiler. The
Dhrystone is particularly sensitive to string or array handling
optimisations.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.05     N/A        1.04    1.05     N/A
BenchTime        :   60.03   60.03               60.03   60.03
Null loops       :    3469    2569                3696    2746
Bench loops      :   26653   18518               14654   13304
LoopOverhead     :    7.99    7.57                4.12    5.09
TotalTime        :   52.04   52.46               55.91   54.94
Loops per second :  512.17  352.98              262.12  242.14

TopSpeed Pascal performs almost twice as many Dhrystone loops as Turbo
on the 386 (NOx87), and about 46% more on the 486DX (EMU). This
suggests that the optimisation performed by TopSpeed Pascal is giving
it an edge.


FBench Benchmark
================

This benchmark uses a complete optical ray-tracing algorithm and
provides a good indication of a compiler's floating point performance
and accuracy. The benchmark can be very sensitive to the efficiency of
the trigonometric functions in the run-time library. The benchmark is
also very sensitive to errors in the calculations, though these results
aren't displayed here since both compilers fell within the tolerances
of the benchmark.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.04    1.04        1.05    1.04    1.04
BenchTime        :   60.09   60.03   60.04       60.03   60.03   60.03
Null loops       :    3346    2519    2518        3665    2673    2757
Bench loops      :     383   15375   15997        1159    1280    8538
LoopOverhead     :    0.12    6.35    6.61        0.33    0.50    3.22
TotalTime        :   59.97   53.68   53.43       59.70   59.53   56.81
Loops per second :    6.39  286.41  299.39       19.41   21.50  150.29

Turbo Pascal does exceptionally well in this benchmark where floating
point is emulated in software (the NOx87 column). This is probably due
to the fact that Turbo Pascal uses a 6-byte representation for REALs
whereas TopSpeed Pascal uses an 8-byte representation with a 10-byte
internal representation within the emulator. However, TopSpeed Pascal
does very much better when a coprocessor is present. TopSpeed Pascal
drives the chip in 'open mode', which tends to result in exceptionally
fast floating point code.

NOTE: Turbo Pascal is still using its internal routines in EMU mode,
rather than using the chip.
This may well be my fault, although it may just be failing to detect
the onboard coprocessor.


Fibonacci Benchmark
===================

The Fibonacci benchmark is similar to the Ackermann benchmark: it is a
highly recursive benchmark useful for testing function call overhead.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.04     N/A        1.04    1.04     N/A
BenchTime        :   60.20   61.85               60.64   60.25
Null loops       :    3377    2480                3778    2637
Bench loops      :      24      24                  21      20
LoopOverhead     :    0.01    0.01                0.01    0.01
TotalTime        :   60.19   61.84               60.63   60.24
Loops per second :    0.40    0.39                0.35    0.33

TopSpeed Pascal comes out as the winner in this benchmark too, by about
15% on the 386 (0.40 against 0.35 loops per second), illustrating that
register parameter passing can have a marked effect. The effect is
greatest where there are a large number of calls to small functions or
procedures taking a number of parameters.


Float Benchmark
===============

The Float Benchmark is a trivial floating point benchmark. This
benchmark has been popular for benchmarking C compilers and often gives
some indication of the efficiency of floating point expressions.
However, the benchmark is prone to being optimised almost out of
existence by a clever optimiser.

Real world applications don't contain code of this nature unless they
are poorly written, so a compiler that is clever enough to spot these
cases is not necessarily better on a larger scale. Many compiler
implementors would rather write a code generator that devotes its
efforts to performing useful optimisations on realistic code than to
correcting a program's design flaws.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.05    1.04    1.05        1.04    1.05    1.05
BenchTime        :   66.24   60.09   60.03       66.46   61.90   60.36
Null loops       :    3321    2522    2472        3724    2695    2778
Bench loops      :       9     342     352           8       8     133
LoopOverhead     :    0.00    0.14    0.15        0.00    0.00    0.05
TotalTime        :   66.24   59.95   59.88       66.46   61.90   60.31
Loops per second :    0.14    5.70    5.88        0.12    0.13    2.21

TopSpeed demonstrates a slight advantage in this benchmark where
floating point is emulated in software (NOx87), suggesting a more
efficient handling of floating point expressions despite the larger
representation. When the coprocessor is in use TopSpeed's advantage
more than doubles.


Gamm Benchmark
==============

The GAMM benchmark is a floating point benchmark that provides an
indication of the efficiency of floating point expressions. Unlike the
Float benchmark it is non-trivial and not subject to over-optimisation.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.04    1.04        1.04    1.04    1.05
BenchTime        :   60.15   60.03   60.04       60.04   60.03   60.03
Null loops       :    3365    2547    2577        3800    2752    2737
Bench loops      :     192   10690    9031         959    1084    5928
LoopOverhead     :    0.06    4.36    3.64        0.26    0.41    2.27
TotalTime        :   60.09   55.67   56.40       59.78   59.62   57.76
Loops per second :    3.20  192.04  160.14       16.04   18.18  102.64

Turbo Pascal comes out the winner by a mile in this one where floating
point is emulated in software (NOx87). When the coprocessor is in use
TopSpeed Pascal is the clear winner. This benchmark again suggests that
Turbo Pascal's internal floating point representation gives it a clear
advantage in software emulation.


IntMath Benchmark
=================

The IntMath benchmark is a trivial benchmark that illustrates the
efficiency of integer expressions. It is unlikely to be over-optimised
and so should provide a pretty good idea of the compiler's capabilities
in a very particular area.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.04     N/A        1.04    1.05     N/A
BenchTime        :   60.09   60.04               61.74   61.02
Null loops       :    3384    2505                3736    2697
Bench loops      :     809     846                  30      46
LoopOverhead     :    0.25    0.35                0.01    0.02
TotalTime        :   59.84   59.69               61.73   61.00
Loops per second :   13.52   14.17                0.49    0.75

Turbo Pascal executes this benchmark extremely slowly, suggesting that
TopSpeed Pascal's optimisation is having a great effect. Bear in mind,
however, that TopSpeed Pascal is using its natural INTEGER (4-byte)
type whereas Turbo Pascal is using LONGINTs (4-byte). The 4-byte
representation is not as efficient as Turbo Pascal's 2-byte INTEGER,
but it is necessary to use it for a fair comparison with TopSpeed.


RealMath Benchmark
==================

The RealMath benchmark is a trivial benchmark that illustrates the
efficiency of floating point expressions. It is unlikely to be
over-optimised and so should provide a pretty good idea of the
compiler's capabilities in a very particular area.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.05    1.04    1.04        1.05    1.04    1.04
BenchTime        :   60.03   60.04   60.03       60.03   60.15   60.04
Null loops       :    3417    2510    2465        3731    2774    3779
Bench loops      :     580   31335   32934         404     441    7654
LoopOverhead     :    0.18   12.98   13.90        0.11    0.17    2.11
TotalTime        :   59.85   47.06   46.13       59.92   59.98   57.93
Loops per second :    9.69  665.90  713.86        6.74    7.35  132.12

TopSpeed Pascal outperforms Turbo in this benchmark; this result
appears to contradict the result obtained from the GAMM benchmark.


Savage Benchmark
================

The Savage Benchmark illustrates the efficiency of the compiler's
trigonometric functions.
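
The Savage benchmark is usually given as one expression applied
repeatedly, with each function undoing the one inside it, so the exact
answer is known and any drift measures accumulated error in the
run-time library. The sketch below is in Python rather than Pascal. The
iteration count of 2499 (giving an exact answer of 2500) follows the
commonly published form of the benchmark and is an assumption about the
version included with this paper.

```python
import math


def savage(iterations=2499):
    """Repeatedly apply tan(atan(exp(ln(sqrt(a*a))))) + 1, starting at 1.

    With exact arithmetic the result would be iterations + 1.
    """
    a = 1.0
    for _ in range(iterations):
        a = math.tan(math.atan(math.exp(math.log(math.sqrt(a * a))))) + 1.0
    return a


if __name__ == "__main__":
    result = savage()
    # The difference from 2500 is the accumulated rounding error.
    print(f"result = {result:.10f}")
    print(f"error  = {abs(result - 2500.0):.3e}")
```
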

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.05    1.04        1.05    1.04    1.04
BenchTime        :   61.30   60.25   60.03       64.53   63.71   60.42
Null loops       :    3419    2467    2555        3759    2717    3817
Bench loops      :       4     183     188          10      11      76
LoopOverhead     :    0.00    0.08    0.08        0.00    0.00    0.02
TotalTime        :   61.30   60.17   59.95       64.53   63.71   60.40
Loops per second :    0.07    3.04    3.14        0.15    0.17    1.26

Turbo Pascal does well in this benchmark where floating point is
emulated in software (NOx87), again suggesting that the non-standard
6-byte REALs are giving it an edge. Again TopSpeed wins under the
coprocessor.


Sieve Benchmark
===============

The Sieve is a classic benchmark that illustrates the efficiency of
array indexing and integer expressions.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.05     N/A        1.05    1.05     N/A
BenchTime        :   60.04   60.03               60.09   60.03
Null loops       :    3308    2514                3614    2775
Bench loops      :     960    1206                 422     467
LoopOverhead     :    0.30    0.50                0.12    0.18
TotalTime        :   59.74   59.53               59.97   59.85
Loops per second :   16.07   20.26                7.04    7.80

TopSpeed Pascal does extremely well in this benchmark, which may
suggest that TopSpeed's 4-byte INTEGER expressions are more efficient
than Turbo's 4-byte LONGINTs.


Store Benchmark
===============

The Store Benchmark is a trivial test of the efficiency of a Pascal
implementation's file IO. This can be an extremely misleading
benchmark: it may be affected by the form of buffering used, if any,
and by IO checking of various forms.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.04    1.04     N/A        1.04    1.04     N/A
BenchTime        :   60.75   60.20               60.20   60.31
Null loops       :    3407    2602                3712    3862
Bench loops      :      40      34                  69      59
LoopOverhead     :    0.01    0.01                0.02    0.02
TotalTime        :   60.74   60.19               60.18   60.29
Loops per second :    0.66    0.56                1.15    0.98

Turbo Pascal comes out better in this benchmark; however, this might be
due to buffering, to the checks the IO routines perform (or don't), and
to whether the IO routines are designed for typed files.
I wasn't able to run the Turbo Benchmark on the 486; for some reason it
never terminated.


TrigLog Benchmark
=================

The TrigLog benchmark tests the efficiency of a compiler's
trigonometric and logarithmic floating point functions.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.05    1.05    1.05        1.04    1.05    1.04
BenchTime        :   60.30   60.03   60.09       60.31   60.75   60.03
Null loops       :    3430    2479    2467        3716    2752    3703
Bench loops      :      14     658     670          39      44     241
LoopOverhead     :    0.00    0.28    0.29        0.01    0.02    0.07
TotalTime        :   60.30   59.75   59.80       60.30   60.73   59.96
Loops per second :    0.23   11.01   11.20        0.65    0.72    4.02

Turbo Pascal does well in this benchmark where floating point is
emulated in software (NOx87), again suggesting that the non-standard
6-byte REALs are giving it an edge. Again TopSpeed Pascal wins under
the coprocessor.


Whetstone Benchmark
===================

The Whetstone Benchmark is a synthetic benchmark for testing floating
point performance. It contains a number of weighted floating point
expressions and operations on arrays and records containing REALs. The
Whetstone can give an indication of performance in real-world
floating-point intensive applications.

                          TopSpeed                      Turbo
                     NOx87     EMU     x87       NOx87     EMU     x87
                     =====================       =====================
NullTime         :    1.05    1.05    1.04        1.04    1.04    1.05
BenchTime        :   72.72   60.36   60.14       61.96   61.52   60.09
Null loops       :    3399    2391    2560        3739    2780    4020
Bench loops      :       5     148     167          13      14      82
LoopOverhead     :    0.00    0.06    0.07        0.00    0.01    0.02
TotalTime        :   72.72   60.30   60.07       61.96   61.51   60.07
Loops per second :    0.07    2.45    2.78        0.21    0.23    1.37

Turbo Pascal does well in this benchmark where floating point is
emulated in software (NOx87), again suggesting that the 6-byte REALs
are giving it an edge. TopSpeed runs twice as fast under the
coprocessor though.


EXECUTABLE SIZE
===============

One measure of a compiler's abilities that is often quoted is the size
of the resulting executable. While small executables are desirable, a
small executable does not always indicate a better compiler.
With small programs such as these benchmarks it is conceivable that a
large percentage of the program's size is made up of routines from the
run-time library. These routines may have been implemented differently
for each compiler because each vendor may have slightly different
goals. For example, implementing a Pascal run-time library for ISO
conformance may result in a larger IO library than a simple DOS IO
interface.

For this reason you shouldn't assume that a compiler which generates a
smaller executable from a small source file will also generate a
smaller executable from a huge one. The larger the amount of source
code, the more the size depends on the efficiency of the compiler and
linker.

Furthermore, some libraries are implemented largely in assembler,
whilst others are implemented in a high level language. The latter
often carries some size penalty, particularly in small programs.

                    TopSpeed     Turbo
                     (bytes)   (bytes)
                    ========     =====
    ackerman.exe       19743      6160
    dhry.exe           21274      8608
    fbench.exe         23700     11632
    fibonacc.exe       19649      5824
    float.exe          19647      6288
    gamm.exe           20991      8112
    imath.exe          19633      5984
    rmath.exe          19634      5856
    savage.exe         20436      7072
    sieve.exe          19691      6096
    store.exe          22353      6416
    tmath.exe          20113      6992
    tscrn.exe          19671      5792
    whet.exe           22979     10688
    whetchk.exe        21395     10224


CONCLUSION:
===========

These benchmarks illustrate 3 advantages that Turbo Pascal has over
TopSpeed Pascal: the non-standard 6-byte REALs offer a performance
advantage on machines which have neither a coprocessor nor a 486DX; for
small programs Turbo Pascal produces smaller executables; and simple
file IO appears to be faster.

However, TopSpeed Pascal uses standard floating point representations
that match those used by the coprocessor. There are no restrictions as
to which floating point representation you may use in your TopSpeed
Pascal program. The TopSpeed emulator correctly uses x87 code when run
on a 486DX.
If you have a machine with a coprocessor, floating point programs may
run up to 5 times faster with TopSpeed Pascal (see RealMath). There are
a number of ways of controlling file IO in order to speed up
operations.

Although for small programs TopSpeed Pascal produces larger executables
than Turbo, this is not always the case. We have had reports of
programs larger than 400K under Turbo shrinking to 250K under TopSpeed!
This suggests that the overhead is introduced by the run-time library.

Turbo Pascal's apparent advantages are short lived. Most people are
using larger machines now and applications are growing. The size
difference for small programs is going to be of little concern to most
users and developers; however, program shrinkage for larger
applications is still an issue. Most people running heavily numeric
applications own a machine with a coprocessor.

I may be biased, but I think that what little information one can
safely glean from these benchmarks shows TopSpeed Pascal to be the
compiler of choice for most serious developers.

If anybody wishes to provide substantiated statistics from real world
applications which show Turbo Pascal against TopSpeed Pascal or
Modula-2, I'd be happy to include them in this paper.

Sean Wilson, Clarion Software
11 August 1992